
fix: add eval-before-train to train_async.py (parity with train.py)#1906

Open
Taosheng-ty wants to merge 2 commits into THUDM:main from Taosheng-ty:feat/eval-before-train-async

Conversation

@Taosheng-ty

Summary

  • train.py evaluates the model before training starts (to record a baseline metric), but train_async.py was missing this step
  • Add the same 3-line check to train_async.py, placed after update_weights() and before the first generate.remote() call

Condition (matches train.py exactly)

if args.eval_interval is not None and args.start_rollout_id == 0 and not args.skip_eval_before_train:
    ray.get(rollout_manager.eval.remote(args.start_rollout_id))
  • Only fires when --eval-interval is set
  • Only on fresh starts (start_rollout_id == 0), not on resume from checkpoint
  • Skippable via --skip-eval-before-train
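The three conditions above can be checked with a small standalone predicate. The parser below is a hypothetical stand-in for the repo's arguments.py (only the flag names and defaults are taken from this PR):

```python
import argparse

# Hypothetical stand-in for the flags defined in arguments.py;
# flag names and defaults match the PR description.
parser = argparse.ArgumentParser()
parser.add_argument("--eval-interval", type=int, default=None)
parser.add_argument("--start-rollout-id", type=int, default=0)
parser.add_argument("--skip-eval-before-train", action="store_true", default=False)

def should_eval_before_train(args):
    # Mirrors the condition added to train_async.py.
    return (args.eval_interval is not None
            and args.start_rollout_id == 0
            and not args.skip_eval_before_train)

# Fresh start with --eval-interval set: baseline eval fires.
assert should_eval_before_train(parser.parse_args(["--eval-interval", "5"]))
# Resume from a checkpoint: eval is skipped.
assert not should_eval_before_train(
    parser.parse_args(["--eval-interval", "5", "--start-rollout-id", "3"]))
# Explicit opt-out: eval is skipped.
assert not should_eval_before_train(
    parser.parse_args(["--eval-interval", "5", "--skip-eval-before-train"]))
```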

Placement

actor_model.update_weights()         ← sglang gets initial weights
check_weight_update_equal            ← verify weights
>>> eval before training <<<         ← NEW: baseline eval with initial weights
rollout_data_next_future = ...       ← first rollout generation starts
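Put in context, the startup sequence looks roughly like the sketch below. The stubs are hypothetical stand-ins for the Ray actor methods named in the diagram; only the call order is the point:

```python
calls = []

# Hypothetical stand-ins for the actor methods; they only record call order.
def update_weights():
    calls.append("update_weights")

def check_weight_update_equal():
    calls.append("check_weights")

def eval_remote(rollout_id):
    calls.append(f"eval({rollout_id})")

def generate_remote(rollout_id):
    calls.append(f"generate({rollout_id})")

eval_interval = 5
start_rollout_id = 0
skip_eval_before_train = False

update_weights()                    # sglang gets the initial weights
check_weight_update_equal()         # verify the transfer
if (eval_interval is not None and start_rollout_id == 0
        and not skip_eval_before_train):
    eval_remote(start_rollout_id)   # NEW: baseline eval with initial weights
generate_remote(start_rollout_id)   # first rollout generation starts

assert calls == ["update_weights", "check_weights", "eval(0)", "generate(0)"]
```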

Test plan

  • Verified --skip-eval-before-train arg exists in arguments.py (store_true, default=False)
  • Verified condition matches train.py line 68
  • Verified eval runs after update_weights() so sglang has correct weights
  • Verified skipped on resume (start_rollout_id > 0)

🤖 Generated with Claude Code

tttaosheng and others added 2 commits May 13, 2026 01:46
For multi-turn agent rollouts where tool-result tokens dominate the
response (often >90%), computing log-probs and entropy for all positions
wastes memory and compute — those masked positions contribute zeros to
the loss anyway.

This adds a loss_masks parameter to get_log_probs_and_entropy. When
provided (and cp_size == 1), only positions where mask == 1 go through
the expensive vocab-parallel softmax. Outputs are padded back to the
original response length with zeros so all downstream code (advantages,
sum_of_sample_mean, etc.) works unchanged.
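The filtering idea can be illustrated with a small NumPy sketch. This is a standalone illustration of the technique, not the repo's vocab-parallel implementation; the function and argument names below are made up:

```python
import numpy as np

def masked_logprobs_entropy(logits, tokens, loss_mask):
    """Compute per-token log-probs and entropy only where loss_mask == 1.

    logits: (T, V) float array, tokens: (T,) chosen-token ids,
    loss_mask: (T,) array of 0/1. Returns (log_probs, entropy), each (T,),
    padded with zeros at masked-out positions so downstream code is unchanged.
    """
    T, V = logits.shape
    log_probs = np.zeros(T)
    entropy = np.zeros(T)
    idx = np.nonzero(loss_mask)[0]            # positions that contribute to the loss
    if idx.size == 0:
        return log_probs, entropy
    sel = logits[idx]                         # only these rows go through softmax
    sel = sel - sel.max(axis=1, keepdims=True)         # numerical stability
    log_z = np.log(np.exp(sel).sum(axis=1, keepdims=True))
    logp = sel - log_z                        # log-softmax over the vocab, (k, V)
    log_probs[idx] = logp[np.arange(idx.size), tokens[idx]]
    entropy[idx] = -(np.exp(logp) * logp).sum(axis=1)
    return log_probs, entropy
```

With 97% of positions masked out, only ~3% of the rows ever reach the softmax, which is where the quoted ~30x reduction in softmax compute comes from.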

Typical savings for agentic workloads:
  - 97% masked tokens → ~30x reduction in softmax compute
  - Prevents OOM on long multi-turn samples with large tool outputs
  - Communication in vocab-parallel all-reduces drops proportionally

Limitations:
  - Only active when cp_size == 1 (falls through to unfiltered path
    for context parallelism > 1)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
train.py runs an evaluation step before the first training rollout when
--eval-interval is set and --skip-eval-before-train is not passed. This
provides a baseline metric for comparison. train_async.py was missing
this, so users had no pre-training eval checkpoint to compare against.

Add the same check, placed after update_weights() (so sglang has the
correct initial weights) and before the first generate.remote() call.
Only fires on fresh starts (start_rollout_id == 0), not on resume.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
